Table of Contents:¶

  • Set 1 Analysis: Unicast vs Multicast
    • Introduction
    • Configuration
    • Latency Data Summary
    • Latency CDFs Per Run
    • Throughput Data Summary
    • Throughput CDFs Per Run
  • Set 2 Analysis: Unicast vs Multicast For Increasing Participants
    • Introduction
    • Test Configuration
    • Latency CDFs
  • Set 3 Analysis: Distributed Denial of Service
    • Introduction
    • 50 Participant Comparison
      • Latency CDFs Per Data Length
    • 100 Participant Comparison
      • Latency CDFs Per Data Length

Set 1 Analysis: Unicast vs Multicast ¶

Introduction ¶

So... the foundation. The start. The base case. The focus of this set of tests is twofold: it acts as a baseline against which the other test results can be compared, and it doubles as a calibration step, letting us check that the system is working as it should and that the results make sense.

Configuration ¶

| Configuration | Values | Notes |
| --- | --- | --- |
| Participant Amount | 3P + 3S | P: Publishers. S: Subscribers. |
| Publisher Allocation | 1, 1, 1, 0 | VM1, VM2, VM3, VM4 |
| Subscriber Allocation | 1, 1, 1, 0 | VM1, VM2, VM3, VM4 |
| Data Length | 100B | |
| Test Duration | 900s | |
| Test Type | throughput | Choices were throughput or latency. |
| Latency Count | 1000 | Number of packets between each latency measurement packet. |
| Reliability | reliable | Choices were reliable or best effort. |
| Communication Method | unicast / multicast | Unicast is one-to-one communication whilst multicast is one-to-many. |
| Network Transport | UDPv4 | As opposed to the default of sharedmem, where participants on the same machine don't require samples to travel to the routing device. |

So... starting off, what do we want to see? What do we need to show? We have latency and throughput data, where latency is measured per publisher and throughput per subscriber, so there is a lot more throughput data than latency data. Before we graph anything we should summarise the data.

We have 2 different test configurations and 3 runs of each one. So we have 6 instances of results. The following configurations were run:

  1. Unicast
  2. Multicast

Let's take a look at the result files:

  1. Unicast
    • average_latencies.csv
    • sub_0_output_average_throughputs.csv
    • sub_1_output_average_throughputs.csv
    • sub_2_output_average_throughputs.csv
  2. Multicast
    • average_latencies.csv
    • sub_0_output_average_throughputs.csv
    • sub_1_output_average_throughputs.csv
    • sub_2_output_average_throughputs.csv

For each test we have the average latency and average throughputs.

Given this, let's start off by plotting the CDFs of the latencies for each run.
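All of the CDF plots below go through the `plot_cdf` helper from `all_functions`. As a rough sketch of the underlying computation (the helper's real signature and styling are not shown here), an empirical CDF is just the sorted samples plotted against their cumulative ranks:

```python
import numpy as np

def empirical_cdf(samples):
    """Sorted sample values paired with their cumulative probabilities."""
    xs = np.sort(np.asarray(samples, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys

# Four latency samples in microseconds:
xs, ys = empirical_cdf([300, 100, 200, 400])
# xs = [100, 200, 300, 400], ys = [0.25, 0.5, 0.75, 1.0]
```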

Latency Data Summary ¶

In [ ]:
from all_functions import *
from set_1_functions import *
In [ ]:
ucast = [file for file in get_files('data/v1/set_1') if 'average_latencies' in file and 'unicast' in file]
mcast = [file for file in get_files('data/v1/set_1') if 'average_latencies' in file and 'multicast' in file]

# test_log = [file for file in get_files("data") if 'network_log' in file and '.log' in file][1]

plot_lat_summary_table(ucast)
plot_lat_summary_table(mcast)

Latency CDFs Per Run ¶

In [ ]:
files = [file for file in get_files('data/set_1') if 'average' in file and 'forced' in file]
raw_lat_files = [file for file in files if 'average_latencies' in file]
# Match on the filename rather than assuming a fixed ordering of the file list.
avg_lats = {
    'unicast': [file for file in raw_lat_files if 'unicast' in file][0],
    'multicast': [file for file in raw_lat_files if 'multicast' in file][0],
}

fig, ax = plt.subplots(figsize=(20, 10))

df = pd.read_csv(avg_lats['unicast'])
combined_df = pd.concat([ df['run_1_latency'], df['run_2_latency'], df['run_3_latency'] ])
plot_cdf('', df['run_1_latency'], ax, greens[0], 'latency')
plot_cdf('', df['run_2_latency'], ax, greens[2], 'latency')
plot_cdf('', df['run_3_latency'], ax, greens[4], 'latency')
plot_cdf('Unicast Average', combined_df, ax, greens[0], 'average')
ax.axvline(combined_df.mean(), 0, 1, ls="--", color=blues[0], label="Unicast Mean")
unicast_mean = combined_df.mean()

df = pd.read_csv(avg_lats['multicast'])
combined_df = pd.concat([ df['run_1_latency'], df['run_2_latency'], df['run_3_latency'] ])
plot_cdf('', df['run_1_latency'], ax, reds[0], 'latency')
plot_cdf('', df['run_2_latency'], ax, reds[2], 'latency')
plot_cdf('', df['run_3_latency'], ax, reds[4], 'latency')
plot_cdf('Multicast Average', combined_df, ax, reds[0], 'average')
ax.axvline(combined_df.mean(), 0, 1, ls="--", color=oranges[0], label="Multicast Mean")

ax.annotate("+" + format_number(get_percent_diff(combined_df.mean(), unicast_mean)) + "%", (unicast_mean, 0.1), (combined_df.mean() + 5, 0.1), arrowprops={"arrowstyle": "<-", "color": "black"}, fontweight='bold')

_ = ax.get_figure().suptitle("Unicast vs Multicast: Latency CDFs Over 3 15-Minute Runs", fontsize=15, fontweight='bold')
_ = ax.legend()
ax.set_xlabel(r"Latency ($\mu$s)", fontsize=12, fontweight="bold")
ax.set_ylim(ymin=0, ymax=1)
ax.set_xlim(xmin=100, xmax=1500)
ax.set_xticks(list(ax.get_xticks()) + [unicast_mean, combined_df.mean(), ])
ax.grid()

plt.tight_layout()
  • Unicast has overall better latency
  • Multicast increases latency average by 16%
  • Unicast average is 482us
  • Multicast average is 560us
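The 16% figure above comes from `get_percent_diff`, whose real definition lives in `all_functions`; assuming it is the usual relative difference against a baseline, the arithmetic checks out:

```python
def get_percent_diff(value: float, baseline: float) -> float:
    """Relative difference of value against baseline, in percent (assumed form)."""
    return 100 * (value - baseline) / baseline

# Multicast average (560us) vs unicast average (482us):
increase = round(get_percent_diff(560, 482), 1)  # 16.2
```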

Throughput Data Summary ¶

In [ ]:
tp_files = [file for file in get_files('data/set_1') if 'sub' in file and '.csv' in file and 'average' in file and 'forced_transport' in file]
ucast_files = [file for file in tp_files if 'unicast' in file]
mcast_files = [file for file in tp_files if 'multicast' in file]

sub_1¶

In [ ]:
# Get sub_1 throughput files for unicast and multicast
ucast_tps = [file for file in get_files('data/set_1') if 'average' in file and 'throughput' in file and 'sub_1' in file and 'unicast' in file][0]
mcast_tps = [file for file in get_files('data/set_1') if 'average' in file and 'throughput' in file and 'sub_1' in file and 'multicast' in file][0]
plot_summary_table(ucast_tps, mcast_tps)

sub_2¶

In [ ]:
# Get sub_2 throughput files for unicast and multicast
ucast = [file for file in get_files('data/set_1') if 'average' in file and 'throughput' in file and 'sub_2' in file and 'unicast' in file][0]
mcast = [file for file in get_files('data/set_1') if 'average' in file and 'throughput' in file and 'sub_2' in file and 'multicast' in file][0]
plot_summary_table(ucast, mcast)

Throughput CDFs Per Run (Unicast vs Multicast) ¶

In [ ]:
tp_files = [file for file in files if 'throughput' in file]

avg_tps = {
    'unicast': [file for file in tp_files if 'unicast' in file],
    'multicast': [file for file in tp_files if 'multicast' in file]
}

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(25, 10))

axes[0].set_title("Sub 0 Throughput (Unicast vs Multicast)")
axes[1].set_title("Sub 1 Throughput (Unicast vs Multicast)")
axes[2].set_title("Sub 2 Throughput (Unicast vs Multicast)")

plot_unicast_tp_cdfs(avg_tps['unicast'], axes)
plot_multicast_tp_cdfs(avg_tps['multicast'], axes)

for ax in axes:
    ax.legend()
    ax.grid()
    ax.set_xlabel("Throughput (mbps)")
    ax.set_xlim(xmin=40, xmax=80)
    ax.set_ylim(ymin=0, ymax=1)
plt.tight_layout()
  • Multicast generally has better overall throughput

Set 2 Analysis: Unicast vs Multicast For Increasing Participants ¶

Table of Contents¶

  • Introduction
  • Test Configuration
  • Latency CDFs
  • Latency CDFs Per Participant Amount
  • Throughput CDFs
  • Throughput CDFs Per Participant Amount

Introduction ¶

For Set 2 we change the number of participants whilst comparing Unicast performance with Multicast. This is mainly to see at what point Multicast outperforms Unicast.

In theory, we know that Multicast has 2 "hops" regardless of the number of participants: one hop from the publisher to the multicast router and another hop from the multicast router to all participants. Meanwhile, Unicast has 2n hops for n participants, since each participant receives the data sequentially. Therefore, generally speaking, Multicast should outperform Unicast. However, the data speaks differently...
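The hop-count model above can be written down directly (a toy sketch of the theory, not measured data):

```python
def multicast_hops(n_participants: int) -> int:
    # One hop from the publisher to the multicast router, plus one hop
    # from the router to all participants at once: constant cost.
    return 2

def unicast_hops(n_participants: int) -> int:
    # Each participant is served sequentially: 2 hops per participant.
    return 2 * n_participants

# In theory multicast wins for any n > 1: 2 hops vs 2n hops.
```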

Test Configuration ¶

So we defined some tests to see how the performance is affected when increasing the participants with the following configurations:

| Configuration | Values | Notes |
| --- | --- | --- |
| Participant Amount | 10P + 10S, 25P + 25S, 50P + 50S, 100P + 100S | P: Publishers. S: Subscribers. |
| Publisher Allocation | [3, 2, 3, 2], [6, 7, 6, 7], [12, 13, 12, 13], [25, 25, 25, 25] | Test 1, Test 2, Test 3, Test 4 [VM1, VM2, VM3, VM4] |
| Subscriber Allocation | [3, 2, 3, 2], [6, 7, 6, 7], [12, 13, 12, 13], [25, 25, 25, 25] | Test 1, Test 2, Test 3, Test 4 [VM1, VM2, VM3, VM4] |
| Data Length | 100B | |
| Test Duration | 900s | |
| Test Type | throughput | Choices were throughput or latency. |
| Latency Count | 1000 | Number of packets between each latency measurement packet. |
| Reliability | reliable | Choices were reliable or best effort. |
| Communication Method | unicast / multicast | Unicast is one-to-one communication while multicast is one-to-many. |
| Network Transport | UDPv4 | As opposed to the default of sharedmem, where participants on the same machine don't require samples to travel to the routing device. |

Basically, we varied unicast and multicast for 10P + 10S, 25P + 25S, 50P + 50S, and 100P + 100S.

For each of the tests (10P + 10S, ..., 100P + 100S) we have the following results as files (applies for both unicast and multicast):

  • average_latencies.csv
  • sub_0_output_average_throughput.csv
  • ...
  • sub_n_output_average_throughput.csv

Therefore, the next step is actually analysing the results. We start off looking at the CDFs of the latencies:

Results Summary¶

In [ ]:
s2_plot_latency_summary_tables()

Latency CDFs ¶

In [ ]:
set2_plot_latency_cdfs()

Latency CDFs Per Participant ¶

In [ ]:
def set2_plot_latency_cdfs_per_participant():

    fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(35, 60))
    fig.suptitle("Latency CDFs Per Participant (Unicast vs Multicast)", fontsize=20, fontweight='bold')

    s2_lats = [file for file in get_files('data/set_2') if 'average_latencies' in file and '_4_' not in file and 'forced_transport' in file]

    s2_ucast_lats = [file for file in s2_lats if 'unicast' in file]
    s2_ucast_lats.sort()

    for i, file in enumerate(s2_ucast_lats):
        df = pd.read_csv(file)
        combined_df = pd.concat([df['run_1_latency'], df['run_2_latency'], df['run_3_latency']])
        ax = axes[i]
        
        ax.set_ylim(ymin=0)
        ax.set_xlabel(r"Latency ($\mu$s)")

        if i == 0:
            # ax.set_xlim(xmin=0, xmax=15000)
            ax.set_title("10P + 10S", fontsize=15, fontweight='bold')
        elif i == 1:
            # ax.set_xlim(xmin=0, xmax=100000)
            ax.set_title("25P + 25S", fontsize=15, fontweight='bold')
        elif i == 2:
            # ax.set_xlim(xmin=0, xmax=300000)
            ax.set_title("50P + 50S", fontsize=15, fontweight='bold')
        # elif i == 3:
        #     ax.set_xlim(xmin=0, xmax=600000)
        #     ax.set_title("75P + 75S", fontsize=15, fontweight='bold')

        plot_cdf("", df['run_1_latency'], ax, greens[0], 'normal')
        plot_cdf("", df['run_2_latency'], ax, greens[0], 'normal')
        plot_cdf("", df['run_3_latency'], ax, greens[0], 'normal')
        plot_cdf(get_test_names([file])[0] + " Average", combined_df, ax, greens[0], 'average')

    s2_mcast_lats = [file for file in s2_lats if 'multicast' in file]
    s2_mcast_lats.sort()
    # Rotate so the multicast files line up with the participant amounts on the axes.
    s2_mcast_lats.append(s2_mcast_lats.pop(0))

    for i, file in enumerate(s2_mcast_lats):
        df = pd.read_csv(file)
        combined_df = pd.concat([df['run_1_latency'], df['run_2_latency'], df['run_3_latency']])
        ax = axes[i]
        
        ax.set_ylim(ymin=0)
        ax.set_xlabel(r"Latency ($\mu$s)")

        # if i == 0:
            # ax.set_xlim(xmin=0, xmax=15000)
            # ax.set_title("10P + 10S (Multicast)", fontsize=15, fontweight='bold', color=reds[0])
        # elif i == 1:
            # ax.set_xlim(xmin=0, xmax=120000)
            # ax.set_title("25P + 25S (Multicast)", fontsize=15, fontweight='bold', color=reds[0])
        # elif i == 2:
            # ax.set_xlim(xmin=0, xmax=300000)
            # ax.set_title("50P + 50S (Multicast)", fontsize=15, fontweight='bold', color=reds[0])
        # elif i == 3:
            # ax.set_xlim(xmin=0, xmax=600000)
            # ax.set_title("75P + 75S", fontsize=15, fontweight='bold')

        plot_cdf("", df['run_1_latency'], ax, reds[0], 'normal')
        plot_cdf("", df['run_2_latency'], ax, reds[0], 'normal')
        plot_cdf("", df['run_3_latency'], ax, reds[0], 'normal')
        plot_cdf(get_test_names([file])[0] + " Average", combined_df, ax, reds[0], 'average')

    for ax in axes:
        ax.grid()
        ax.legend(loc=2)

    plt.tight_layout(pad=5)

    print(s2_ucast_lats)
    print(s2_mcast_lats)

set2_plot_latency_cdfs_per_participant()
['data/set_2\\2_1_participant_measure_unicast_1_forced_transport\\average_latencies.csv',
 'data/set_2\\2_2_participant_measure_unicast_2_forced_transport\\average_latencies.csv',
 'data/set_2\\2_3_participant_measure_unicast_3_forced_transport\\average_latencies.csv']
['data/set_2\\2_6_participant_measure_multicast_2_forced_transport\\average_latencies.csv',
 'data/set_2\\2_7_participant_measure_multicast_3_forced_transport\\average_latencies.csv',
 'data/set_2\\2_5_participant_measure_multicast_1_forced_transport\\average_latencies.csv']

Latency Average Per Participant Amount¶

In [ ]:
s2_plot_latency_avg_per_participant()

Throughput CDFs ¶

In [ ]:
set2_plot_tp_cdfs()

Set 2 Rerun Analysis: Unicast vs Multicast ¶

Introduction ¶

When looking at the latency results of Set 2 we noticed a lot of variation between runs of the same test. This was strange because the test parameters were identical for every run. Each test worked by restarting the VM, running the test (run 1), restarting the VM, running the test (run 2), then restarting the VM and running the test a final time (run 3).

The variation between runs increased with the participant amount. Below is a bar chart demonstrating the variation between the averages of each run:

Average Latency Variation Per Participant Amount¶

In [ ]:
s2_rerun_plot_latency_variation_per_participant()
(The cell above raised `KeyError: 'run_3_latency'` inside `s2_rerun_plot_latency_variation_per_participant` (all_functions.py, line 922): one of the rerun CSVs apparently lacks a `run_3_latency` column, so the bar chart did not render.)

As seen in the bar chart above:

  • latency variation for 10P + 10S is minimal
  • latency variation for 25P + 25S Unicast is quite visible, especially for run 2
    • run 2 avg is almost double run 1
  • latency variation for 25P + 25S Multicast is visible
    • there is a 20% difference between run 1 and run 3
  • latency variation for 50P + 50S is large
    • Unicast run 1 is almost double run 3
    • Multicast variation is not as large as Unicast
  • latency variation for 75P + 75S is quite large too

Therefore, we want to investigate why exactly there is this variation between runs despite all known variables being constant. We reran the Set 2 test but this time we tracked the CPU and network usage. Below we first plot the latency CDFs per participant amount on the left and then we plot the network usage on the right of it.
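Before the per-VM plots, here is a minimal sketch of how the run-to-run variation could be quantified, assuming the `run_*_latency` column layout used throughout these CSVs (`run_variation` is a hypothetical helper, not one from `all_functions`):

```python
import pandas as pd

def run_variation(df: pd.DataFrame) -> float:
    """Percent spread between the best and worst per-run average latency."""
    means = [df[col].mean() for col in df.columns if col.endswith('_latency')]
    return 100 * (max(means) - min(means)) / min(means)

df = pd.DataFrame({
    'run_1_latency': [100, 110],
    'run_2_latency': [200, 210],
    'run_3_latency': [120, 130],
})
# Run means are 105, 205 and 125, giving a spread of roughly 95%.
```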

Latency CDFs vs Network Usage¶

Latency/Throughput vs Network Usage Per VM¶

10P + 10S Latency/Throughput vs Network Usage Per VM (Unicast)¶

In [ ]:
lat_tp_vs_net_per_vm_10p10s_unicast()

10P + 10S Latency/Throughput vs Network Usage Per VM (Multicast)¶

In [ ]:
lat_tp_vs_net_per_vm_10p10s_multicast()

25P + 25S Latency/Throughput vs Network Usage Per VM (Unicast)¶

In [ ]:
lat_tp_vs_net_per_vm_25p25s_unicast()

25P + 25S Latency/Throughput vs Network Usage Per VM (Multicast)¶

In [ ]:
lat_tp_vs_net_per_vm_25p25s_multicast()

50P + 50S Latency/Throughput vs Network Usage Per VM (Unicast)¶

In [ ]:
lat_tp_vs_net_per_vm_50p50s_unicast()

50P + 50S Latency/Throughput vs Network Usage Per VM (Multicast)¶

In [ ]:
lat_tp_vs_net_per_vm_50p50s_multicast()

75P + 75S Latency/Throughput vs Network Usage Per VM (Unicast)¶

In [ ]:
lat_tp_vs_net_per_vm_75p75s_unicast()

75P + 75S Latency/Throughput vs Network Usage Per VM (Multicast)¶

In [ ]:
lat_tp_vs_net_per_vm_75p75s_multicast()

Set 2 24-hour Run Analysis¶

Because the variation between runs in Set 2 was visible, we decided to run some tests for a longer amount of time to produce a larger sample. With the larger sample size, the per-run averages should converge much better towards the expected value, so we hope to see less variation between runs.
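The intuition can be illustrated with a toy simulation (synthetic exponential "latencies", not our data): per-run means computed from larger samples spread far less.

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed, latency-like population with a true mean of 500us.
population = rng.exponential(scale=500, size=1_000_000)

def mean_spread(sample_size: int, runs: int = 3) -> float:
    """Spread (max - min) of per-run means for runs of a given length."""
    means = [rng.choice(population, size=sample_size).mean() for _ in range(runs)]
    return max(means) - min(means)

short_spread = mean_spread(900)     # stand-in for a 15-minute run
long_spread = mean_spread(86_400)   # stand-in for a 24-hour run
# long_spread comes out far smaller: the per-run means converge with sample size.
```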

We start off by mapping 75P + 75S Unicast/Multicast. On the left we have the old CDFs with 15-minute runs and on the right we have the 24-hour runs.

Results Summary¶

In [ ]:
avg_lat_24hr = [file for file in get_files("data/set_2") if '24_hours' in file and 'average_latencies' in file]

plot_lat_summary_table(avg_lat_24hr)

Latency CDF Comparison¶

In [ ]:
s2_24hr15m_cdf_comparison()
In [ ]:
s2_24hr15m_cdf_comparison_single_plot()

Set 3 Analysis: Distributed Denial of Service ¶

Introduction ¶

This set of tests focuses on the performance change when DDS is under a DDOS attack. We have two participant counts that we can compare with Set 2.

In Set 2 we had:

  1. 25 pubs + 25 subs: 50 participants
  2. 50 pubs + 50 subs: 100 participants

In Set 3 we have:

  1. 25 pubs + 25 mal pubs + 25 subs + 25 mal subs: 100 participants
  2. 50 pubs + 50 mal pubs + 50 subs + 50 mal subs: 200 participants

Therefore, we can compare Set 2 (1.) with Set 3 (1.) and Set 2 (2.) with Set 3 (2.) since in both cases they have the same amount of legitimate participants. For all Set 3 results we combine all of the measurements into a single list and plot these into CDFs since we are not focusing on the difference between specific runs.
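The run combination described above is just a concatenation; a sketch of the pattern, assuming the same `run_*_latency` CSV layout as the earlier sets (`combine_runs` is a hypothetical helper):

```python
import pandas as pd

def combine_runs(df: pd.DataFrame) -> pd.Series:
    """Merge every run_* latency column into one flat sample list."""
    run_cols = [c for c in df.columns if c.startswith('run_') and c.endswith('_latency')]
    return pd.concat([df[c] for c in run_cols], ignore_index=True).dropna()

df = pd.DataFrame({'run_1_latency': [1, 2], 'run_2_latency': [3, 4]})
# combine_runs(df) -> a Series holding [1, 2, 3, 4]
```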

For the DDOS attacks we have taken half of the participants and placed them in a separate domain where they exchange samples of the following sizes:

  • 300 bytes
  • 500 bytes
  • 1 kilobyte
  • 16 kilobytes
  • 64 kilobytes
  • 128 kilobytes
  • 512 kilobytes
  • 1 megabyte

Therefore, we first look at the comparison between Set 2 and Set 3 with 50 legitimate participants before looking at the comparison between Set 2 and Set 3 with 100 legitimate participants.

50 Participant Comparison ¶

Latency CDFs Per Data Length ¶

In [ ]:
def plot_mini_cdf(ax, title, ddos_size, normal_files, ddos_files):
    ax.set_title(label=title, fontsize=12, fontweight="bold")
    for file in ddos_files[ddos_size]:
        df = pd.read_csv(file)
        if 'run_1_latency' in df and 'run_2_latency' in df and 'run_3_latency' in df:
            combined_df = pd.concat([ df["run_1_latency"], df["run_2_latency"], df["run_3_latency"] ])
        else:
            combined_df = df['run_1_latency']
        if 'unicast' in file:
            # plot_cdf('', df["run_1_latency"], ax, greens[0], 'normal')
            # plot_cdf('', df["run_2_latency"], ax, greens[0], 'normal')
            # plot_cdf('', df["run_3_latency"], ax, greens[0], 'normal')
            plot_cdf(ddos_size.upper() + " DDOS Unicast DDS Latency", combined_df, ax, greens[0], 'average')
            # 25934: hard-coded average latency (us) of the normal (non-attack) unicast test.
            ax.text(50000, (combined_df.mean() / 200000), "Avg.: " + format_number(combined_df.mean()) + r"$\mu$s", color=greens[0], backgroundcolor='white', fontweight='bold', fontsize=10)
            ax.annotate('', (60000, (combined_df.mean() / 200000)), (60000, 25934 / 200000), arrowprops={"arrowstyle": "->", "color": greens[0]})
            ax.text(40000, 0.2, "+" + format_number(get_percent_diff(combined_df.mean(), 25934)) + "%", fontweight='bold', color=greens[0])
        else:
            # plot_cdf('', df["run_1_latency"], ax, reds[0], 'normal')
            # plot_cdf('', df["run_2_latency"], ax, reds[0], 'normal')
            # plot_cdf('', df["run_3_latency"], ax, reds[0], 'normal')
            plot_cdf(ddos_size.upper() + " DDOS Multicast DDS Latency", combined_df, ax, reds[0], 'average')
            # 29149: hard-coded average latency (us) of the normal (non-attack) multicast test.
            ax.text(90000, (combined_df.mean() / 200000), "Avg.: " + format_number(combined_df.mean()) + r"$\mu$s", color=reds[0], backgroundcolor='white', fontweight='bold', fontsize=10)
            ax.annotate('', (100000, (combined_df.mean() / 200000)), (100000, 29149 / 200000), arrowprops={"arrowstyle": "->", "color": reds[0]})
            ax.text(80000, 0.2, "+" + format_number(get_percent_diff(combined_df.mean(), 29149)) + "%", fontweight='bold', color=reds[0])

    for file in normal_files:
        df = pd.read_csv(file)
        if 'run_1_latency' in df and 'run_2_latency' in df and 'run_3_latency' in df:
            combined_df = pd.concat([ df["run_1_latency"], df["run_2_latency"], df["run_3_latency"] ])
        else:
            combined_df = df['run_1_latency']

        if 'unicast' in file:
            plot_cdf('Normal Unicast DDS Latency', combined_df, ax, blues[0], 'average')
            ax.text(50000, (combined_df.mean() / 200000), "Avg.: " + format_number(combined_df.mean()) + r"$\mu$s", color=blues[0], backgroundcolor='white', fontweight='bold', fontsize=10)
        else:
            plot_cdf('Normal Multicast DDS Latency', combined_df, ax, oranges[0], 'average')
            ax.text(90000, (combined_df.mean() / 200000), "Avg.: " + format_number(combined_df.mean()) + r"$\mu$s", color=oranges[0], backgroundcolor='white', fontweight='bold', fontsize=10)

def s3_plot_ddos_latency_cdf_comparison(mode):
    # Avoid shadowing the builtin `type`.
    """
    Get data for normal test (non-attack)
    """
    normal_files = [file for file in get_files("data/set_2") if 'average_latencies' in file and 'forced_transport' in file and ('unicast_2_' in file or 'multicast_2_' in file)]
    """
    Get data for ddos test
    """
    if 'rerun' in mode:
        all_ddos_files = [file for file in get_files("data/set_3") if 'average_latencies' in file and 'rerun' in file and ('unicast_2_' in file or 'multicast_2_' in file)]
        plot_title = "DDOS Latency CDF Comparison of 6-Hour Tests"
    else:
        all_ddos_files = [file for file in get_files("data/set_3") if 'average_latencies' in file and 'rerun' not in file and ('unicast_2_' in file or 'multicast_2_' in file)]
        plot_title = "DDOS Latency CDF Comparison of 15-Minute Tests"
    
    # print(normal_files)

    ddos_files = {
        # "1mb": [file for file in all_ddos_files if '1024_kilobyte' in file],
        # "512kb": [file for file in all_ddos_files if '512_kilobyte' in file],
        "128kb": [file for file in all_ddos_files if '128_kilobyte' in file],
        "64kb": [file for file in all_ddos_files if '64_kilobyte' in file],
        # "16kb": [file for file in all_ddos_files if '16_kilobyte' in file]
    }

    fig = plt.figure(figsize=(30, 15))
    fig.suptitle(plot_title, fontsize=15, fontweight='bold')
    grid = plt.GridSpec(3, 3, figure=fig)

    combined = plt.subplot(grid[0:2,0:2])
    bot_left = plt.subplot(grid[2, 0])
    bot_mid = plt.subplot(grid[2, 1])
    top_right = plt.subplot(grid[0, 2])
    top_mid = plt.subplot(grid[1, 2])
    bot_right = plt.subplot(grid[2, 2])

    all = [combined, bot_left, bot_right, bot_mid, top_right, top_mid]

    """
    Plot combined latency graph
    """
    combined.set_title(label="Latency CDFs Combined", fontsize=12, fontweight='bold')
    combined.set_xlim(xmin=0, xmax=200000)
    for size in ddos_files:
        for file in ddos_files[size]:
            df = pd.read_csv(file)
            # Label only one size so the legend isn't flooded. 16kb is currently
            # commented out of ddos_files, so label the 64kb series instead.
            if '64_kilobyte' in file:
                title = "DDOS Latency"
            else:
                title = ""

            if 'run_1_latency' in df and 'run_2_latency' in df and 'run_3_latency' in df:
                combined_df = pd.concat([ df["run_1_latency"], df["run_2_latency"], df["run_3_latency"] ])
            else:
                combined_df = df['run_1_latency']
            
            if 'unicast' in file:
                title = title + " Unicast" if len(title) > 0 else ""
                plot_cdf(title, combined_df, combined, greens[0], 'average')
            else:
                title = title + " Multicast" if len(title) > 0 else ""
                plot_cdf(title, combined_df, combined, reds[0], 'average')
    for file in normal_files:
        df = pd.read_csv(file)
        if 'run_1_latency' in df and 'run_2_latency' in df and 'run_3_latency' in df:
            combined_df = pd.concat([ df["run_1_latency"], df["run_2_latency"], df["run_3_latency"] ])
        else:
            combined_df = df['run_1_latency']
        if 'unicast' in file:
            plot_cdf('Normal Unicast Latency', combined_df, combined, blues[0], 'average')
        else:
            plot_cdf('Normal Multicast Latency', combined_df, combined, oranges[0], 'average')

    # plot_mini_cdf(bot_left, "16KB DDOS Latency CDFs Per Run", '16kb', normal_files, ddos_files)
    plot_mini_cdf(bot_mid, "64KB DDOS Latency CDFs Per Run", '64kb', normal_files, ddos_files)
    plot_mini_cdf(bot_right, "128KB DDOS Latency CDFs Per Run", '128kb', normal_files, ddos_files)
    # plot_mini_cdf(top_mid, "512KB DDOS Latency CDFs Per Run", '512kb', normal_files, ddos_files)
    # plot_mini_cdf(top_right, "1MB DDOS Latency CDFs Per Run", '1mb', normal_files, ddos_files)

    for ax in all:
        ax.spines['right'].set_visible(False)
        ax.spines['top'].set_visible(False)
        ax.set_ylim(ymin=0, ymax=1)
        ax.grid()
        # ax.legend()
        ax.set_xlabel(r"Latency ($\mu$s)")
        ax.set_xlim(xmin=0, xmax=175000)

    plt.tight_layout()


s3_plot_ddos_latency_cdf_comparison('normal')
s3_plot_ddos_latency_cdf_comparison('rerun')

# print([file for file in get_files('data/set_2')])

100 Participant Comparison ¶

Latency CDFs Per Data Length ¶